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Abstract: A general procedure of average-case performance evaluation for population dynamics such as genetic algo- 
rithms (GAs) is proposed and its validity is numerically examined. We introduce a learning algorithm of Gibbs 
distributions from training sets which are gene configurations (strings) generated by GA in order to figure out 
the statistical properties of GA from the view point of thermodynamics. The learning algorithm is constructed 
by means of minimization of the Kullback-Leibler information between a parametric Gibbs distribution and 
the empirical distribution of gene configurations. The formulation is applied to the solvable probabilistic 
models having multi-valley energy landscapes, namely, the spin glass chain and the Sherrington-Kirkpatrick 
model. By using computer simulations, we discuss the asymptotic behaviour of the effective temperature 
scheduling and the residual energy induced by the GA dynamics. 

Keywords: Genetic algorithms, Evolutionary optimization, Machine learning, Population dynamics, Thermodynamics, 
Average-case performance, Spin glass model, Statistical physics 
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1 Introduction 

Genetic Algorithm (GA) is a heuristics to find the 
best possible solution for combinatorial optimization 
problems and it is based on several relevant operators 
such as selection, crossover and mutation on the gene 
configurations (strings) leading to transition from one 
state to the others. This probabilistic algorithm was 
firstly introduced in the book Adaptation in Natu- 
ral and Artificial Systems by John Holland in 1975 
flH.Holland, 1975| l and now it has been widely used 
in various research fields and established as one of the 
effective algorithms to find the solution within reason- 
able computational time (E.Goldberg, 1989). 

However, it does not mean that the GA can be au- 
tomatically applied to any problem and can find the 
candidates of the solutions immediately. We should 
choose the suitable information representation of each 
gene (member) in population (ensemble) for a given 
problem and the parameters which control the opera- 
tions should be set to the optimal values. Basically, 
these operations except for the selection, which is de- 
pendent on the fitness, are defined as procedures to 
generate the states randomly and the operations do not 
always confirm to increase the fitness value. To make 
matter worse, there are few mathematical justification 
for the GA to make the system to convergence to one 
of the best possible solution. 

From this fact in mind, in this paper, in order to 
figure out the statistical properties of GA from the 
view point of thermodynamics, we introduce a learn- 
ing algorithm of Gibbs distributions from training 



sets which are gene configurations generated by GA. 
A procedure of average-case performance evaluation 
for genetic algorithms is examined. The learning 
algorithm is constructed by means of minimization 
of the Kullback-Leibler information between a 
parametric Gibbs distribution and the empirical 
distribution of gene configurations. The formulation 
is applied to the solvable probabilistic models having 
multi-valley energy landscapes, namely, the spin 
glass chain ( 10719811 |H.Chen and K.Ma, 1982) 
and the Sherrington-Kirkpatrick model 
(Sherrington and Kirkpatrick, 1975) in statistical 
physics. By using computer simulations, we discuss 
the asymptotic behaviour of the effective temperature 
scheduling and the residual energy induced by the GA 
dynamics. We also reveal the operator-dependence of 
the behaviour. 

As well-known, a study focusing on the distri- 
bution of gene configurations in GA itself is not a 
bran-new approach. Actually, the so-called Estima- 



tion of Distribution Algorithm (EDA) (Baluja, 1994 



Pelikan et al., 1999 



Pelikan et al., 2002 



IPelikan et a~ 2000 
|L.ShapnoT2005 



L.Shapiro, 20"06l S.Correa and L.Shapiro, 2006) 



is a well-known and established approach to find the 
best possible solution by estimating the distribution 
of gene configuration during the GA dynamics and 
one can use the distribution to produce the genes in 
the next generation. In fact, a lots of such studies 
have been done for various problems. 

For instance, Prugel-Bennett and 

Shapiro ( |Prugel-Bennett and L.Shapiro, 1994; 



Prugel-Bennett and L.Shapiro, 1997 ) evaluated the 
time evolution of the cummulants of distributions 
and discussed the statistical properties of GA from 
the dynamical point of view. Suzuki (Suz uki, 19951 
ISuzuki, 19981 ISuzuki, 2005| ) represented the relation- 
ship between the gene configurations by graphical 
models and estimated the joint probability or the 
marginal probability of the genes by making use of 
Belief propagation on the graphical models. 



Nevertheless there exist such extensive studies, 
in the present study, we choose a Gibbs distribution 
which is specified by a single parameter, namely, tem- 
perature T and learns the distribution (the effective 
temperature) from the gene configurations produced 
by GA. Thus, we attempt to figure out the average- 
case performance of GA from the view point of tem- 
perature scheduling in simulated annealing. More- 
over, the evaluation of average-case performance is 
partially carried out analytically by choosing the en- 
ergy function of solvable spin glass models. These 
points are remarkable distinctions of our approach in 
the present paper. 



This paper is organized as follows. In the next 
section, we mention the relationship between the 
GA and simulated annealing from the view point of 
the distribution of ensembles (population) on Marko- 
vian process. In section 3, we explain our formu- 
lation and tools to investigate the average-case per- 
formance of the GA. We construct the Boltzmann- 
machine-type learning equation via the minimization 
of Kullback-Leibler information between the empiri- 
cal distribution of GA and a Gibbs distribution with 
respect to the effective temperature. The validity of 
a Gibbs form of the distribution is confirmed by the 
so-called Holland's condition. The learning equa- 
tion is rewritten in terms of optimization of the en- 
ergy function for Ising spin systems. The concept of 
average-case performance is mentioned, namely, the 
so-called self-averaging properties for physical quan- 
tities and the replica method to carry out the average 
are introduced. In the next section 4, we introduce 
our benchmark test problem, namely, the combina- 
torial optimization problem having the energy func- 
tion of the so-called spin glasses. The mathematically 
tractable spin glasses, namely, spin glass chain and 
the Sherrington-Kirkpatrick model are introduced and 
their statistical properties are revealed. In section 5, 
we explain the set-up of our numerical experiments 
and the results are reported in the next section 6. The 
last section is devoted to concluding remarks. 



2 GA and SA 



As we mentioned, in this paper, we consider the 
statistical properties of GA from the view point of 
thermodynamics. In simple GA, we define each gene 
configuration (member) by a string of binary vari- 
ables with lengths, that is, s = (s\,S2,--- ,Sp/),Si £ 
{— 1,+1}, and we attempt to make each configura- 
tion in ensemble with size M to the state which gives 
a minimum of the energy function H(s), say, s*D The 
problem is systematically solved by GA if the sys- 
tem evolves according to a Markovian process and the 
gene distribution Pq\ (s) at time (generation) t might 



converge as Pq A {s) — » Pq A {s) and we have 



(-), 



(=i 



(i) 



On the other hand, one of the effective heuristics 
which is well-known as Simulated Annealing (SA) 
( |Kirkpatrick et al., 1983] |Geman and Geman, 1984) 
is achieved by inhomogeneous Markovian process. 
The process is realized by Markov chain Monte 
Carlo method (MCMC) which leads to an equilib- 
rium Gibbs distribution at temperature T = p -1 (from 
now on, the p is referred to as 'inverse temperature'), 
namely, 



-pW#(, 



(2) 



In SA, the temperature is scheduled very slowly in 
time as pM — > °° (T" — > 0), and then, we can solve 
the problem as 



Pf>(s) = 5(s-s* 



N 



(3) 



Therefore, both the GA and the SA share a concept 
to make the distribution convergence to a single (or 
several) delta-peak(s) at the solution(s). However, in 
general, the Markovian (dynamical) process of GA is 
very hard to treat mathematically due to the global 
transition between the states by the crossover or, espe- 
cially, the mutation operator, whereas the SA causes 
only local transitions between the states. From the 
view point of EDA, the dynamics of GA should lead 
to an empirical distribution of states. As we shall 
mention later on, the distribution is more likely to 
be a Gibbs one and it might be reasonable approach 
to grasp the shape through the Gibbs form (effective 
temperature) of the distribution. 



3 Formulation and tools 



3.1.1 The Holland's condition 



In this section, we explain our formulation and 
several tools to evaluate the average-case performance 
of GA through the effective temperature scheduling of 
the Gibbs distribution that is trained from gene con- 
figurations of simple GA. 

3.1 Kullback-Leibler information 

We start our argument from the distance between an 

empirical distribution from GA dynamics Pg\(s) and 

a Gibbs distribution P^ (s) at the effective tempera- 
ture T — P . The distance is measured by the fol- 
lowing Kullback-Leibler information (KL) 

KL(P GA \\P B ) = £P GA (,) log | | (4) 

where the summation with respect to all possible gene 
configurations s — (si, — - ,sn) is defined by 



EG 



E (• 



(5) 



In this paper, we represent each component of gene 
configurations by s; = ±1 instead of sj = 0, 1 because 
we choose the cost function of spin glasses to be 
minimized as a benchmark test later on. The 'spin' 
here means a tiny magnet in atomic scale-length and 
Sj = +1 stands for 'up-spin' and vice versa. We 
should keep in mind that the above distance is depen- 
dent on the inverse temperature p. Thus, we obtain the 
following Boltzmann-machine-type learning equation 
with respect to p as 



dt 



dKLjP^WP^) 

ap 



(6) 



We naturally expect that the effective temperature 
evolves so as to minimize the KL information for each 
time step. When both distributions become identical 

one in the limit of t — >• °°, namely, P^(s) = Pg°\s), 
we obtain 



dP 



Y t = E^(*H^n*)/aP}/^n*) 

= (3/3p)^S(*-a*) = 3a/3p = (7) 

s 

and the time evolution of inverse-temperature then 
stops. We should notice that a = L s 5(s — s *) is the 
number of degeneracy at the lowest energy states. 



Before we examine the time-dependence of the effec- 
tive temperature p, we comment on the validity of the 
choice of a Gibbs form as the distribution. John Hol- 
land mentioned that the algorithm might be effective 
if the probability P{pl ,t) = Eie.?r that a schema 
!tf appears at generation (time step) t follows the fol- 
lowing condition as a kind of 'Master equation' of 
probabilistic flow: 



dt 



= f{X,t)-P{X,t)f{j,t) (8) 



where J stands for an arbitrary schema which is 
different from the 9i and p,(f) is a probability 
that a gene configuration i appears at generation t 
( |HHolland, 1915) . f[p(,t) denotes the average fit- 
ness of the schema H at generation t : 



(9) 



The above equation means that the probability that a 
H appears increases proportional to the average fit- 
ness value of H and it also decreases proportional to 
the average fitness values f(j,t) =^iej43{ s{i)Pi{t)- 
One can easily show that the above condition is 
satisfied by a Gibbs distribution having the form: 



Pi(t) 



exp[Prg(Q] 
Ljej ex P[Prg(./')] ' 



(10) 



For simplicity, we assume that the inverse tempera- 
ture increases linearly in time as p, = t. Then, the 
above ( fTOb leads to 



Pi(t) 



e*p[fg(Q] 



(11) 



Taking the derivative of both sides of the above equa- 
tion with respect to t, we have 

d Pi (t) _ g(i) e«0 (Zjv t?*W ) - e'i'(') Z je , g(j)e^ 



dt 



= P,(t)g(i)-p,(t)Y.gU)Pj( t )- ( 12 ) 

Taking the derivative of P(tt ,t) = Eiew Pi{* ) w hh re- 
spect to t and substituting the above ( TTZI ) into the right 



hand side of the equation, we obtain 



©, the learning equation leads to 



dP(H,t) 
dt 



dpijt) 
dt 



= £ \ Pi(t)g(i)-Pi(t) Y,s{j)pj{t) \ 

i'e:tf left jE3 

= f(X,t)-P(X,t)f(j,t) (13) 

where we used the definition (0 of average fitness of 
the schema H at generation t . This equation is noth- 
ing but the Holland's condition dS). This result means 
that the empirical distribution of genes which are gen- 
erated by GA dynamics is more likely to be a Gibbs 
distribution or can be well-approximated by a Gibbs 
distribution specified by the inverse temperature p if 
the GA effectively finds the solution for a given opti- 
mization problem. This fact provides us a justification 
of the present approach to make a Gibbs distribution 
learns from the GA dynamics. 



3.2 Learning equation for spin systems 



In the previous section, we formulated the learning 
equation for general problems and discussed some 
key properties including the Holland's condition in 
the formulation. Here we attempt to restrict ourselves 
to more particular problems, namely, we deal with a 
class of combinatorial optimization problems whose 
cost functions are described by the energy function of 
Ising model. 

We first reformulate the equation ® by means of 
Ising spin systems having the energy function H(s) = 
— 'EijJijSiSj. For the case of positive constant spin- 
spin interaction /y — J > 0, V,-j, the lowest energy 
state is apparently given by Si = +1, V,- (all-up spins) 
or Si = — 1, V; (all-down spins). However, as we 
shall see in the following sections, for the case of 
randomly distributed (the ± sign is also random), 
the lowest energy state is highly degenerated and it 
becomes very hard to find the state. It should be 
noted that the traveling salesman problem (TSP) (see 
e.g. (Me zard and Parisi, 1986| l) or the ^-satisfiability 
problem (£-SAT) (see e.g. ( Monasson etal., 1999| l) is 
rewritten in terms of optimization problems described 
by the variant of the above energy function of spin 
glasses. 

Substituting the corresponding Gibbs distribution 
Pb(s) = exp[— p\ff(s)]/£ s exp[— P#(*)] into equation 



Ls (Lij JjjfjS j ) exp [g Zij JijSjS j] 



(14) 



where the second term appearing in the right hand 
side of the above equation is internal energy of 
the system described by the Hamiltonian H(s) = 
~LijJij s i s j at temperature T = p~', whereas the first 
term is the energy H(s) averaged over the empirical 
distribution Pqa (*) of GA. Then, we immediately find 
that the condition 



s 'j s ij 

I*ex.p\$ZijJijSiSj] 



(15) 



yields d$/dt = for P GA (s) = P B {s). 

In general, it is very hard to calculate the internal 
energy of the spin system 

because 2 N sums for all possible configurations in 
• • ) are needed to evaluate the E({J} : p), where 
we defined a set of interactions by 



{J} = {J u \i,j=l,--- ,N}. 



(17) 



To overcome this difficulty, we usually use the so- 
called Markov chain Monte Carlo (MCMC) method 
to calculate the expectation ( TTol l by important sam- 
pling from the Gibbs distribution at temperature T = 

On the other hand, the first term appearing in the 
right hand side of (TBI , we evaluate the expectation by 
making use of 

U G a({J}) = -Y,Pga(s) I £/ W J 



= -^LlLVKMMM)) (18) 



1=1 \ ij 



where Si(t,l) is the l-th sampling point at time t from 
the empirical distribution of GA. Namely, we shall 
replace the expectation of the cost function H(s) = 
~Y.ijJij s i s j over tne distribution Pga{s) by sampling 
from the empirical distribution of GA. 

By a simple transformation p — > T^ 1 in equation 
( TPfl i. we obtain the Boltzmann-machine-type learning 



equation with respect to effective temperature T as 
follows. 

J = -T 2 (U({J}:T- l )-U GA ({J})) (19) 

From this learning equation, we find that time- 
evolution of effective temperature depends on the dif- 
ference between the expectations of the cost function 
over the Gibbs distribution at temperature T and the 
empirical distribution of GA. 

Obviously, the performance of GA is now eval- 
uated through the 'annealing schedule' of effective 
temperature T, however, the schedule depends on the 
choice of interactions between spins, that is, {/}. 
Therefore, we should average the learning equation 
( fT9b over the such problem-dependent 'input data', 
that is to say, {/}. 

3.3 Average-case performance 

As we mentioned, the difficulties of finding the lowest 
energy states depend on the weights between spins, 
namely, the problem is dependent on the statistical 
properties of interactions {/}. Obviously, the learn- 
ing equation ( TT9l > and its time evolution for a finite 
size system depends on the choice of {/}. Hence, the 
GA which is applied to some specific problem having 
a set of {J} might give an excellent solution as a pe- 
culiar case and the reverse might be also true (the GA 
might give a poor solution as another peculiar case). 
Therefore, we should evaluate the 'average-case per- 
formance' of the learning equation which is indepen- 
dent of the realization of 'problem' {/}. Namely, one 
should evaluate the 'data-averaged' learning equation 

^ = -T 2 (E [J} (U({J} : 7- 1 )) -E W (U GA ({J}))) 

(20) 

to discuss the average-case performance, where we 
defined the average E{/j (• • • ) by 

K{j } (---)^Uf d M---)nJij)- (2D 

ij 

We should keep in mind that in this paper we deal 
with the problem in which each interaction /y has no 
correlation with the others, namely, 

E {J} (J ij J kl )=j\ k 8 Ll (22) 

where we defined J 2 as a variance of P{Jij) and b x . y 
stands for a Kronecker's delta. 

3.3.1 Self-averaging of physical quantities 

In order to carry out the performance evaluation, we 
need to calculate the average of equation (fT9] i over the 



probability of realization {/}, that is, 




r 2 E {7} (t/({7}:p)). (23) 

In statistical physics of disordered spin systems, the 
probability that an arbitrary state x having the energy 
U x appears is given by P x — exp[— pf/J/Z where a 
normalization factor Z is referred to as partition func- 
tion 

Z = £exp[-pC/J. (24) 

X 

Then, the internal energy defined by 

= I, T £/,exp[-pU] = £ Ux ■D(U x )U x exp[-VU x } 

L,exp[-pf/,.] ^©(t/OexphpU] ' 

(25) 

where £> (U x ) stands for a density of state having the 
energy U x , is obtained from the free energy 

F = -T\ogZ (26) 

by using the following relation 

U=^(fiF). (27) 

To use the relationship between the internal and free 
energies, one can rewritten the second term appearing 
in the right hand side of the above equation d23l as 

E {J] (U({J} : P)) = A (PE {7} (F({/} : P))) . (28) 

In statistical physics of disordered spin systems, it is 
well-known that the quantities such as free energy are 
independent of the choice of {/} in the large system 
size limit N — > °°. In other words, the free energy 
calculated for a given realization of {/} is identical to 
the average over the probability P({J}) = WijP{Jij), 
namely, the identity 

Urn F({/} arealization : p) = E {7} (F({J} : p)) (29) 

holds. The mathematically rigorous proof for the 
Sherrington-Kirkpatrick model is given elsewhere 
(see e.g. (Talagrand, 2003)). Thus, we calculate the 
right hand side of d291l for mathematically solvable 
model, whereas we evaluate the left hand side by 
computer simulations for the other models. However, 
as we mentioned above, the both procedures to eval- 
uate the average give the same results in the limit of 
N — s- °o. 

3.3.2 The replica method 

Here we encounter a technical problem in the eval- 
uation of the average. As we mentioned, we should 



evaluate the average such as E{jj (•••), namely, the 
quantity to be evaluated is now written as follows. 



U = E 



{■/} 



|(PF))=|(PE W (F)) 



E {/} log £exp[-|3#(* :{/})] (30) 



Unfortunately, it is very difficult for us to carry out 
the above calculation except for a few limited cases 
because the variables {/} appear in the logarithm of 
the partition function. Then, by making use of the 
identity: logZ = (Z" — l)/« which holds in the limit 
of n — > 0, we calculate the average as 



£ {/} (logZ) = lim 



%}(Z")- 



1 



-{■>} 



(n2=iE..e-P£.^ 8 :W))_i 



(31) 



= lim ■ 

n->0 

where we replaced the average of logZ, namely, 
E{yj(logZ) with the average of Z", that is 
E{y}(Z") by introducing the n-replicas (copies) a = 
1,2, •••,«. This procedure to calculate the aver- 
age of self-averaging quantities is referred to as 



replica method ([Sherrington and Kirkpatrick, 1975 



Mez ardet al., 1987| l. In the evaluation of the learn 
ing equation for the problem having the cost function 
of the Sherrington-Kirkpatrick-type, we shall use this 
technique. 



4 Mathematically tractable models 

In this section, we introduce two kinds of spin 
glass model which will be used as a benchmark cost 
function to be minimized by GA. These models are 
very simple, however, several quantities such as inter- 
nal energy as a function of temperature are obtained 
analytically and very suitable for us to examine the 
average-case performance of GA as a benchmark test. 
The models dealt with are given as follows. 

• Spin glass chain 

It is one-dimensional spin glass model having 
only nearest neighboring interactionsD It is pos- 
sible for us to investigate the temperature depen- 
dence of internal energy and moreover, one can 
obtain the lowest energy exactly. The energy 
function (Hamiltonian in the literature of statis- 
tical physics) is given by 



H 



N 



7; = ^ (0,1) (32) 



where 7, stands for the interaction between spins 
Si and Si+i. 5\£(a,fr) denotes a normal Gaussian 



H/N o 



i f, 



!' 1 



Figure 1: Typical energy landscape H(s) = — Y,iJi s i s i+l 
with P(Jj) = 5\£ (0, 1), E(JiJj) = 8y of the spin glass chain. 
The number of spins is N = 10. It should be noted that 
the horizontal axis S denotes the label of states, that is, 
S = 1,2,--- ,2 N (= 1028). For instance, 5=1 stands for a 
state, say, s(S= 1) = (+1. + 1,-- - , + 1) and5 = 2' v denotes 
*(S = 2*) = (-1,-1,...,-1). 



distribution with mean a variance bD We plot the 
typical energy landscape in Figure Q] From this 
figure, we find that the structure of the energy sur- 
face is complicated and it seems to be difficult for 
us to find the lowest energy state. 

However, we should notice that in d32i > s, takes 
±1 and the product also has a value ±1. 

Hence, we introduce the new variable T, which is 
defined by Tj = then Tj takes T; E {1,-1}. 

Therefore, in order to minimize H(l) — — 
we should determine T, = sgn(7,) for each i 
and then, we have the lowest energy as t/ m j n = 
— HiJi s E n (Ji) — —Hi \Ji\- Namely, when/,- obeys 
a Gaussian with mean Jq and variance J 2 , the low- 
est energy for a single spin is obtained in the ther- 
modynamic limit — s- °° as 



! . U>. |-ni r 

hm 



= %}(W) = 



dJi 



Vi-JoY 



2nJ 



\Ji\ 



-Jo 



[2 -4 
V 7C 



where E^( - • • ) here stands for the average over 
the configuration {/} = (J\ , • • • ,Jn). 

Thus, for the choice of (Jo, J) = ( 1 , 0), namely, 
in the limit of the ferromagnetic Ising model, we 
have the lowest energy as U m j n /N = — 1 (all spins 
align in the same direction), On the other hand, 
for the choice of (Jo, J) = (0, 1), we have £/ m in = 
— \j2j%. These facts mean that the lowest energy 
changes according to the value of ratio Jo/ J. 




Figure 2: Internal energy of spin glass chain as a func- 
tion of temperature. The solid line is exact result U = 



-p7~ 



Dx 



: , whereas the dots denote the internal energy 



' cosh" p.v ' 

calculated by the MCMC for N = 3000. The error-bars are 
calculated by 10-independent runs for different choice of 
the {J} = {Ji\i : = 1, • • • , N}. The inset indicates the l/ m j n as 
a function of Jq. We set / = 1. 



We next consider the case of finite temperature 
(P < °°). For this case internal energy per spin 
is given by 



N 



l 1 m^=E m ((i/) x ) = -^logi:e P ^ 

(33) 

(34) 



with 



I T e X p[pL-7/X ; ] 



where we defined 



•) 



(35) 



= E I (• 

X X,=±l Tjv=±1 

and the partition function Z x = £ T ef s £' ,/ ' x ' is now 
calculated as {2cosh(p7,)} Ar . Hence, we have 
the average free energy density defined by / = 
limA^oo(logZ/AO =i\r 1 E {/} (logZ) is evaluated 
as follows (the self-averaging property we men- 
tioned before was assumed). 



dJi 



>2%J 
Dxlog2coshp(7o 



log2cosh(p7,) 



(36) 



where we defined Dx = dxe~ x2 / 2 /\/2%. From the 
above result, we immediately obtain the internal 
energy per spin U = — 3//3p by 



U = -Jq 
P^ 2 



-Jx) 



Dxtanhp(7 
Dx 

• cosh 2 p(7 +^) 



(37) 



Especially, for the case of (Jo, J) = (0, 1), we have 

Dx 



U 



P 



cosh Px 



(38) 



In Figure [2] we show the U as a function of T. 
From the arguments we provided above, we have 
the following learning equation d39l > for the spin 
glass chain whose Hamiltonian is given by ( l32l is 
now rewritten as 



dT 
dt 



t2 }^iZ(zJrttj)si+i(t^ 



Dx 

-°o cosh 2 T~ x x 



(39) 



• Sherrington-Kirkpatrick model 

This model is a spin glass model in which each 
spin is located on a complete graph. For this 
model, the energy function is explicitly given by 



1 N 

H = -^LL J 'J s < s j 



(40) 



where Jy obeys P{Jij) — !Jt(Jo,J 2 ). We plot the 



H/N o 



Fi gure 3: Typical energy landscape H(s) — —^LijJij s i s j 
with P(Jij) = 9t (0, 1), E(J u J k i) = 8 Lk 8jj of the SK model. 
The number of spins is N = 10. It should be noted that 
the horizontal axis S denotes the label of states, that is, 
S = 1,2, • • • ,2 N (= 1028). For instance, 5=1 stands for a 
state, say, s(S = 1) = (+1, + 1,--- , + 1) and5 = 2 A ' denotes 
s(S = 2 N ) = (-l, -I, ■■■-!). 



typical energy landscape in Figure [3] At first 
glance, it seems that the structure of energy sur- 
face is very similar to that of the spin glass chain, 
however, finding the lowest energy state of the 
SK model needs much more difficult tasks. This 
'ground state problem' in the SK model is one of 
the non-trivial issues in the research field of spin 
glasses. 



By using the replica method we mentioned in the 
previous section, the averaged internal energy per 
spin, namely, the second term appearing in the 
learning equation (f39t is calculated as 



U _ ... , JO 2 V* 

— = Up{m,q) = -—m - — 



(l-q 2 ) (41) 



where, m,q are the replica symmetric solution for 
the magnetization and the spin glass order param- 
eter, respectively. These are explicitly given by 
the following equations of state 



1 



Dz tmh$(Jz*/q+Jotn) 
(42) 

Dz tanh 2 $(Jz^/q+Jo>n) 
(43) 



where we defined 



)e*p[(fi/*f)l4jSiSj] 



(44) 



x ' E,exp[(p/JV)E - S/S> ] 

and Em(' • • ) by (fJTJ. For these solutions for a 
given temperature T, the learning equation for the 
SK model is obtained by 



dT 
dt 



= r 



lim — 

l-^NL 



- T 2 Up(m(T),q(T)). (45) 

Here we should keep in mind that in the limit of 
P — > oo, tanhp(- ••) = sgn(- ••), # = 1 is derived 
from d43l . 

On the other hand, from ( l42l . the magnetization 
to leads to 



Dzsgn(7z + 7o»j) = 1 — 2erfcc 



Jo 

— to 
J 



where we defined erfcc(x) by 

dz 



erfcc(x) 



"T 
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(46) 



(47) 



By utilizing the asymptotic form 2erfcc(x) ~ 1 — 
xy/2/% around x ~ 0, we obtain the critical point 
a = y/2/% (Jo /J) = 1 below which the spin glass 
phase emerges. Hence, the lowest energy at zero 
temperature is obtained by substituting the solu- 
tion m of (|46| | into the expression of internal en- 
ergy (HTb with q = 1, namely, U /N = —Jom 2 /2. 
As a special case, the lowest energy in the fer- 
romagnetic state Jq I J — >• oo is given by t/ //o^V = 
— 1/2. In Figure |4] we plot the internal energies 
per spin scaled as yj2/%(U /JN) = —am 2 /2 and 
U /JqN — —m 2 /2 and magnetization m as a func- 
tion of a (= i/2/% (Jo /J)) at J = 0. 




Figure 4: Parameter a (= ^/2jn(Jo /J ) )-dependence of 
magnetization m, internal energies per spin scaled as 
j2f%(U/JN) = -am 1 jl and f//7 ^ = -m 2 /2. 



For these two mathematically tractable models, we 
shall evaluate the learning equations for effective 
Gibbs distributions in the next section. 



5 Set-up for numerical experiments 

For these two kinds of the solvable spin glass 
models, we examine the learning equations (T39l>(f45T> 
through the time-dependence of the effective temper- 
ature TD In following, we explain our setting of pa- 
rameters which control simple GA to be utilized in 
the learning processes. 

• The number of spins N 

This is the number of components in a single 
gene configuration and is regarded as the num- 
ber of spins in the spin glass model. Here we set 
N = 2000 for spin glass chain and N = 500 for the 
SK model. 

• The number of ensembles (population) M 

The number of population in GA. We set M = 100 

• Parameters appearing in GA 

- o: The number of members in selection of tour- 
nament -type at each generation. 

- p c '. The rate for a single point crossover 

- p m : The mutation rate 

• Effective temperature T 

A control parameter of the Gibbs distribution to 
approximate the empirical distribution of GA. We 
set the initial value T = To (< 

• On the selection 

In our numerical experiments, we generate the 



configurations (members) with length N ran- 
domly and for each of the member, we evaluate 
the fitness values. Then, we pick up o members 
among the population (ensemble) with size M and 
select the largest fitness member and the others are 
discarded. We repeat the process up to M times. 
As another candidate of selection, we might 
use the method to weight each member a 



of population with p a = e P ,£a /£; 



,-P.,£a 



(see e.g. ( Prugel-Bennett and L.Shapiro, 1994 



|Prugel-Bennett and L.Shapiro, 1997) i). Obvi- 
ously, (3i — > °° limit yields the case in which only 
the best solution is selected. Hence, the case 
a = M for our selection rule is identical to the 
Pi — > °° limit. On the other hand, p% = limit 
corresponds to o = 1, namely, each member is se- 
lected randomly. 



6 Results of numerical experiments 

According to the set-up explained in the previous 
section, we shall carry out the numerical experiments 
for two mathematically tractable models. The results 
are summed up below. 

6.1 Spin glass chain 

We first show the time-evolution of effective tem- 
perature and the residual energy for the case of spin 
glass chain with parameter sets: a = 2,p c = 0.1, p m = 
0.001 in Figure [5] From this figure, we find that the 
asymptotic behaviour of the effective temperature fol- 
lows a power-law. This schedule is faster than the ef- 
fective temperature scheduling for the optimal simu- 
lated annealing ~ l/log(l +t), however, slower than 
the exponential decreasing. Thus, here we define the 
residual energy and its time-dependence as the differ- 
ence between the lowest energy and current energy 
obtained by the GA dynamics. We find that the resid- 
ual energy which is defined by 

(48) 



e=H(s) -mini/ (s) 



also asymptotically goes to zero and it follows a 
power-law in the scaling regime t ^> 1. 

To investigate the effect of the selection operator 
on the GA dynamics, we carry out the same numerical 
experiments for the case of a = 1, namely, we inves- 
tigate the time-evolution of the effective temperature 
for the GA without any effective selection (leading up 
to 'random selection'). We plot the result in Figure|6] 
From this figure, we find that the effective tempera- 
ture does not decrease and remains the same value as 
the initial condition. This means that the behaviour of 



log(t) 



200 400 600 



1000 1200 1400 



-0.2 




\ -0.4 

\ -0.6 
\ t»J 

\ "55 -0-8 
\ Q 

\ 12 

\ -1.4 






) 1 2 3 4 5 6 7 

log(t) 





Figure 5: Time evolution of the effective temperature (up- 
per panel) and the residual energy defined by d48b (lower 
panel) for the case of spin glass chain. We used a simple 
GA having a = 2,p c =0.1,p m = 0.001. The inset stands 
for the asymptotic behaviour. 



the effective temperature is strongly dependent on the 
selectionD 

We next consider the relationship between the 
time-evolution of effective temperature, residual en- 
ergy and the values of parameters for GA operations 
during the dynamics. We first fix p c — 0.1, p m = 
0.001 and evaluate the result by changing the parame- 
ter a as o = 2, 3 and 4. The result is shown in Figure[7] 
From these panels, we find that the speed of effective 
temperature decreasing for large a value is faster than 
the result for small a value. However, in the asymp- 
totic regime, the behaviour of effective temperature is 
almost independent of the choice of a valueD 

We next consider the case of p m = 0.0001 , 0.0005 
and 0.001 keeping o = 2 and p c = 0.1. The result 
is shown in Figure [8] From this figure, we con- 
firm that the speed of convergence becomes very slow 
for both initial stage and asymptotic regime of the 
dynamics for p m = 0.001. This result implies that 



t 



Figure 6: Time evolution of the effective temperature for 
the case of spin glass chain by simple GA having p c = 
0.1, p m = 0.001 and without any selection operation a = 1. 



'mixing' among the gene configurations is enhanced 
for large p m so as to prevent the Gibbs distribution 
from converging. On the other hand, for the case of 
p m = 0.0005 in the asymptotic regime, the speed of 
convergence is not so slow although the speed in the 
initial stage is actually slow. We also find this result 
from the behaviour of the residual energy. The result 
for p,„ — 0.0001 gives the largest exponent t, of the 
asymptotic form 



T(t) = t-*, (r»l), 



(49) 



namely, the speed of convergence is the fastest among 
the three casesD From the observation above, we find 
that mutation in a simple GA makes the population 
diverse to prevent us from trapping in a local minima 
of energy function and one can enhanced the speed of 
convergence asymptotically by setting the parameter 
p m to an appropriate value. 

Finally, we investigate the time-evolution of ef- 
fective temperature for p c = 1.0,0.5 and 0.1 keeping 
a = 2 and p m = 0.001. The result is shown in Figure 
[9] From this figure, we find that higher value of the 
crossover rate gives higher convergence of the effec- 
tive temperature. Generally speaking, a crossover is 
one of the essential operators in GA to generate genes 
having good quality in terms of minimization of the 
cost. However, at the same time, one has some risks to 
destruct the good equality gene itself when we choose 
too large crossover rate. The cost function we deal 
with in this section is that of the spin glass chain and 
interactions among the spins exist only in the nearest 
neighboring spin pairs. This fact means that there is 
less possibility that the crossover deconstructs the fine 
genes in comparison with the case of the Sherrington- 
Kirkpatrick model which will be mentioned in the 
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Figure 7: Time evolution of effective temperature and resid- 
ual energy defined by d48b for the case of spin glass chain. 
We used a simple GA specified by a = 2,3,4 keeping 
p c = 0.1 and p m = 0.001. 



next subsection. Actually, for the case of p c = 1 .0, the 
GA gives the best performance among the three cases 
p c — 1.0,0.5 and 0.1. Nevertheless, in the asymptotic 
regime, the three cases gives almost the same perfor- 
mance. 

6.2 Sherrington- Kirkpatrick model 

We next consider the case of the SK spin glass. In the 
SK model, it is difficult for us to obtain the exact low- 
est energy to evaluate the residual energy. Hence, here 
we investigate the time evolution of effective temper- 
ature and the average fitness which is defined as neg- 
ative internal energy —U = —H(s). 

As we discussed in the previous subsection, we 
first investigate the time-evolution of these two phys- 
ical quantities for the case of o = 2,3 and 4 keeping 
p c = 0.05, p m = 0.005. We show the result in Figure 
[TOl From these panels, we find that the asymptotic 
performance through the effective temperature does 
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Figure 8: Time evolution of the effective temperature (upper 
panel) and the residual energy defined by d48b (lower panel) 
for the case of spin glass chain. We utilized a simple GA 
having p m = 0.0005,0.001 and 0.005 keeping p c = 0.1 and 
a = 2. 



not change even if we increases the a value. However, 
it should be noticed that some 'crossover phenom- 
ena' takes place in some generation (time) regime. 
Namely, in this generation regime, the exponent £, in a 
power-law changes to the different exponent \ (> "Q. 
On the other hand, at the beginning of the evolution, 
the average fitness value increases as the a value in- 
creases. 

We next consider the case of p m = 0.005,0.001 
keeping s — 2 and p c = 0.05. The results are shown in 
FigureQl] From this figure, we confirm that the speed 
of convergence for the case of p m = 0.005 slows down 
in the asymptotic regime whereas the speed for the 
case of p m = 0.001 remains. The same behaviour 
as time evolution of the effective temperature is ob- 
served in the lower panel of Figure [TT] 

Finally, we shall show the results for p c = 
0.1,0.05 and 0.01 keeping o = 2 and p m = 0.005 in 
Figure [12] As the SK model is defined on a complete 
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Figure 9: Time evolution of the effective temperature (upper 
panel) and the residual energy defined by d48t (lower panel) 
for the case of spin glass chain. We utilized a simple GA 
specified by p c = 1,0.5,0.1 keeping p m = 0.001 and a = 2. 



graph and all spins are connected, the crossover op- 
eration might destroy the gene configurations having 
relatively high fitness values. However, from the re- 
sults shown in this figure, the average fitness value in- 
creases as the p c increases although the effective tem- 
perature does not change so much. 



7 Concluding remarks 

In this paper, we introduced a learning algorithm 
of Gibbs distributions from training sets which are 
gene strings generated by GA to figure out the sta- 
tistical properties of GA from the view point of ther- 
modynamics. A procedure of average-case perfor- 
mance evaluation for genetic algorithms was numer- 
ically examined. The formulation was applied to the 
solvable probabilistic models having multi-valley en- 
ergy landscapes, namely, the spin glass chain and the 
Sherrington-Kirkpatrick model. By using computer 




Figure 10: Time evolution of the effective temperature (up- 
per panel) and the averaged fitness which is defined as neg- 
ative internal energy — 1/ (lower panel) for the case of SK 
model. We used a simple GA having a = 2, 3 and 4 keeping 
p m = 0.005 and p c = 0.05. In the asymptotic regime / ^> 1 
of time-evolution of temperature, 'crossover phenomena' 
are observed. Namely, the power-law exponent \ changes 
to the different value at intermediate time scale log/ ~ 5. 



Figure 1 1 : Time evolution of the effective temperature (up- 
per panel) and the averaged fitness which is defined as neg- 
ative internal energy — U (lower panel) for the case of SK 
model. We used a simple GA having p m = 0.005,0.001 
keeping p c = 0.05 and a = 2. In the asymptotic regime 
f > 1 of time-evolution of temperature for p m = 0.005, 
'crossover phenomena' are observed. Namely, the power- 
law exponent \ changes to the different value at intermedi- 
ate time scale log/ ~ 5. 



simulations, we discussed the asymptotic behaviour 
of the effective temperature scheduling and the resid- 
ual energy induced by the GA dynamics. 

Both effective temperature and residual energy 
show power-law behaviors given by d49| i, namely, 
p, = r» for t ^> 1, In section 2, we showed that a 
Gibbs distribution with p, = t yields the Holland's 
condition ([8]). Hence, it might be worth while for 
us to check to what extent the condition is modified 
for the Gibbs distribution with p ( = fi. For this case, 
a Gibbs distribution with respect to the gene config- 
uration having the fitness g(i) is written as Piif) = 
exppg(z)]/£; £J exp[r=g(i)]. Taking the derivative 
of the above equation with respect to f, we have 

d Pi (t)/dt = &- l {pi{t)g{i) - Pi(t)Lje, 8{i)Pj(t)}- 
Hence, by substituting this result into the equa- 



tion obtained by taking the derivative of the prob- 
ability P{tt ,t) that a schema H appears, namely, 
P(y{ ,t) = Ytiett Pi(f) w i tn respect to t, we obtained 
the modified Holland's condition as dP(y{ ,t) /dt = 
^ -1 {/(#,f) -P(9C,t)f(j,t)}. From this condi- 
tion, we find that the temporal difference of the prob- 
ability that the schema H appears increases by c^ -1 
for p r = £f 1 . More generally, we conclude that the 
temporal difference increases by d$ t /dt for P,. 

Although we dealt with the average-case perfor- 
mance evaluation just for a simple GA, our general 
procedure given in this paper is apparently applica- 
ble to the other sophisticated GAs based on any pop- 
ulation dynamics. Moreover, one can generalize the 
Gibbs form to be trained by Boltzmann-machine-type 
learning equation so as to include the so-called Tsallis 
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Figure 12: Time evolution of the effective temperature (up- 
per panel) and the averaged fitness which is defined as neg- 
ative internal energy — 1/ (lower panel) for the case of SK 
model. A simple GA having p c = 0.1,0.05,0.01 keeping 
p m = 0.005 and a = 2 is utilized. 



distribution, which is specified by p and q, as a special 
case fNishimori and Inoue, 1998] ). 
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